Non-Crossing Quantile Regression for Distributional Reinforcement Learning
Distributional reinforcement learning (DRL) estimates the distribution over future returns instead of the mean to more efficiently capture the intrinsic uncertainty of MDPs. However, batch-based DRL algorithms cannot guarantee the non-decreasing property of learned quantile curves, especially in the early training stage, leading to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to enforce the monotonicity constraint within each sampled batch, and which can be incorporated into any well-known DRL algorithm. We demonstrate the validity of our method from both the theoretical and implementation perspectives. Experiments on Atari 2600 games show that some state-of-the-art DRL algorithms with the non-crossing modification significantly outperform their baselines, converging faster and achieving better testing performance. In particular, our method can effectively recover the distribution information and thus dramatically increase exploration efficiency when the reward space is extremely sparse.
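The paper's exact network architecture is not reproduced here, but the core idea of the monotonicity constraint can be sketched as follows: instead of predicting N quantile values independently (which allows them to cross), the network predicts a base value plus non-negative increments, so the quantile estimates are non-decreasing by construction. This is a minimal illustrative sketch, assuming a softplus-plus-cumulative-sum parameterization; the function name `non_crossing_quantiles` is hypothetical.

```python
import numpy as np

def non_crossing_quantiles(raw_outputs):
    """Map unconstrained network outputs to non-decreasing quantile estimates.

    The first output is taken as the lowest quantile; each subsequent
    quantile adds a non-negative increment obtained via softplus, so
    q_1 <= q_2 <= ... <= q_N holds by construction, eliminating quantile
    crossing regardless of the training stage.
    """
    raw_outputs = np.asarray(raw_outputs, dtype=float)
    base = raw_outputs[..., :1]
    # softplus(x) = log(1 + exp(x)) >= 0 guarantees non-negative increments
    increments = np.log1p(np.exp(raw_outputs[..., 1:]))
    return np.concatenate([base, base + np.cumsum(increments, axis=-1)], axis=-1)

# Example: four unconstrained outputs become a monotone quantile curve.
q = non_crossing_quantiles([0.5, -1.0, 2.0, 0.3])
assert np.all(np.diff(q) >= 0)  # no quantile crossing, by construction
```

Under this kind of parameterization the quantile loss can be applied unchanged; only the output head of the network differs from the unconstrained baseline.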
Supplements of "Non-crossing quantile regression in deep reinforcement learning"
We first introduce the following lemma, which is used to complete the proof of Lemma 1. Lemma. Consider an MDP with countable state and action spaces. Therefore, inequality (4) holds, which completes the proof. Now we give the proof of Lemma 1. Lemma 1. The proof is similar to that of Proposition 2 of [1]. We assume that the instantaneous reward given a state-action pair is deterministic; the general case is a straightforward generalization via the standard probability argument.
Review for NeurIPS paper: Non-Crossing Quantile Regression for Distributional Reinforcement Learning
Weaknesses:
- Baseline algorithm: While all quantile-based distributional RL algorithms suffer from the crossing-quantile issue, QR-DQN is the least affected, since its quantile fractions are uniformly fixed. IQN [1], which uses randomly sampled quantile fractions, and FQF [2], which optimizes the chosen fractions for a better distribution approximation, are both expected to suffer much more from crossing quantiles than QR-DQN. While it may be non-trivial to adapt the NC architecture to IQN, since its quantile fractions are randomly sampled, it should not be hard to adapt it to FQF. Moreover, IQN and FQF both achieve much higher scores than QR-DQN, so I believe implementing the NC architecture on IQN and FQF would greatly strengthen the empirical validation. Can the authors explain why only 49 of the 57 games are used for evaluation?
- Number of quantiles: I believe that N = 100 quantiles is a reasonable choice.
Review for NeurIPS paper: Non-Crossing Quantile Regression for Distributional Reinforcement Learning
The strong rebuttal with additional results on NC-IQN swayed multiple initially hesitant reviewers to argue for acceptance, and I concur. The one unresolved concern is about reproducing the baseline results more accurately: I assume this is a matter of codebase/implementation details that does not detract from fair head-to-head comparisons.